Move machine-based options from config.base to host files #3053

Open: wants to merge 15 commits into base: develop

DavidHuber-NOAA
Contributor

Description

This moves all machine-specific options to the workflow/hosts files from the config.* files.

This also turns HPSS archiving on for WCOSS2 by default.

Resolves #2942
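
As a rough illustration of the idea (not the code in this PR), the sketch below keeps per-machine values in one mapping per host and fills @KEY@ placeholders in a config template from it, rather than branching on the machine name inside the config file. The HOSTS dictionary, its values, the @KEY@ placeholder syntax, and the render_config function are all assumptions made for the example; the real values would come from the workflow/hosts YAML files.

```python
import re

# Hypothetical per-host settings standing in for workflow/hosts/<machine>.yaml.
# Keys and values are illustrative only, not the actual host file contents.
HOSTS = {
    "hercules": {"SCHEDULER": "slurm", "HPSSARCH": "NO"},
    "wcoss2": {"SCHEDULER": "pbspro", "HPSSARCH": "NO"},
}


def render_config(template: str, machine: str) -> str:
    """Fill @KEY@ placeholders in a template with the chosen host's values."""
    values = HOSTS[machine]

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Leave unknown placeholders untouched so they remain visible.
        return str(values.get(key, match.group(0)))

    return re.sub(r"@(\w+)@", substitute, template)


template = 'export HPSSARCH="@HPSSARCH@"\nexport SCHEDULER="@SCHEDULER@"'
print(render_config(template, "wcoss2"))
```

With this layout, supporting a new machine means adding one host entry and leaving the shared template alone.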

Type of change

  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? NO

How has this been tested?

CI testing on Hercules. The tracker and genesis jobs did not run (as expected).
WCOSS2 testing still needs to be performed.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added

@DavidHuber-NOAA
Contributor Author

This will break the C96_atm3DVar_extended test case until a fix is made for the gfs_downstream.tar file (as mentioned in #3019).

@DavidHuber-NOAA
Contributor Author

There is an issue with the gfs_bufrsnd job that is preventing it from creating the output gfs_collective BUFR files. I am re-disabling HPSS archiving on WCOSS2 for the time being and will add this information to #3019.

Marking ready for review.

@DavidHuber-NOAA added the CI-Wcoss2-Building, CI-Wcoss2-Running, and CI-Wcoss2-Failed labels and removed the CI-Wcoss2-Building and CI-Wcoss2-Running labels on Nov 1, 2024
@DavidHuber-NOAA removed the CI-Wcoss2-Failed label on Nov 4, 2024
Collaborator

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA left a comment

This is a commendable PR and a key demonstration of how to refactor an "if-y diff-y" system-specific configuration construct into a more centralized and Pythonic approach. I particularly liked how you enhanced the formation of the defaulting overrides.
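
One way to picture the "defaulting overrides" mentioned here is a layered merge in which generic defaults are applied first, host values replace them, and explicit user overrides win last. This is a minimal sketch under that assumption; resolve_options and the option names are hypothetical and are not taken from workflow/setup_expt.py.

```python
from typing import Any, Mapping, Optional


def resolve_options(
    defaults: Mapping[str, Any],
    host: Mapping[str, Any],
    overrides: Optional[Mapping[str, Any]] = None,
) -> dict:
    """Merge option layers; later layers take precedence over earlier ones."""
    merged = dict(defaults)          # lowest priority: generic defaults
    merged.update(host)              # host values replace defaults
    merged.update(overrides or {})   # explicit overrides win last
    return merged


# Illustrative values only.
defaults = {"HPSSARCH": "NO", "LOCALARCH": "NO", "DO_TRACKER": "YES"}
host = {"DO_TRACKER": "NO"}
print(resolve_options(defaults, host, {"LOCALARCH": "YES"}))
```

Keeping the precedence in one function replaces per-machine conditionals with a single, testable rule.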

@DavidHuber-NOAA added the CI-Hercules-Ready label on Nov 5, 2024
@emcbot added the CI-Hercules-Building and CI-Hercules-Running labels and removed the CI-Hercules-Ready and CI-Hercules-Building labels on Nov 5, 2024
Contributor

@WalterKolczynski-NOAA WalterKolczynski-NOAA left a comment

Minor things and one thing to sneak in (or leave for another time).

Resolved review threads:

  • workflow/hosts/wcoss2.yaml (outdated)
  • parm/config/gefs/config.base (outdated)
  • workflow/setup_expt.py (outdated)
  • workflow/setup_expt.py
@emcbot removed the CI-Hera-Building label on Nov 6, 2024

emcbot commented Nov 6, 2024

Experiment C96C48_hybatmaerosnowDA FAILED on Hera in Build# 3 with error logs:

/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C96C48_hybatmaerosnowDA_2a89ef85/logs/2021122012/gdas_fcst_seg0.log

Follow link here to view the contents of the above file(s): (link)

@emcbot added the CI-Hera-Failed label and removed the CI-Hera-Running label on Nov 6, 2024

emcbot commented Nov 6, 2024

Experiment C96C48_hybatmaerosnowDA FAILED on Hera in Build# 3 in
/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/EXPDIR/C96C48_hybatmaerosnowDA_2a89ef85

emcbot commented Nov 6, 2024

Experiment C48_ATM FAILED on Hera in Build# 3 with error logs:

/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C48_ATM_2a89ef85/logs/2021032312/gfs_arch.log

Follow link here to view the contents of the above file(s): (link)

emcbot commented Nov 6, 2024

Experiment C96_S2SWA_gefs_replay_ics FAILED on Hera in Build# 3 with error logs:

/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C96_S2SWA_gefs_replay_ics_2a89ef85/logs/2020110100/arch.log

Follow link here to view the contents of the above file(s): (link)

emcbot commented Nov 6, 2024

Experiment C48_ATM FAILED on Hera in Build# 3 in
/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/EXPDIR/C48_ATM_2a89ef85

emcbot commented Nov 6, 2024

Experiment C96_S2SWA_gefs_replay_ics FAILED on Hera in Build# 3 in
/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/EXPDIR/C96_S2SWA_gefs_replay_ics_2a89ef85

emcbot commented Nov 6, 2024

Experiment C48mx500_3DVarAOWCDA FAILED on Hera in Build# 3 with error logs:

/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C48mx500_3DVarAOWCDA_2a89ef85/logs/2021032418/gdas_arch.log

Follow link here to view the contents of the above file(s): (link)

emcbot commented Nov 6, 2024

Experiment C96_atm3DVar FAILED on Hera in Build# 3 with error logs:

/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C96_atm3DVar_2a89ef85/logs/2021122100/gdas_arch.log

Follow link here to view the contents of the above file(s): (link)

emcbot commented Nov 6, 2024

Experiment C96_atm3DVar FAILED on Hera in Build# 3 in
/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/EXPDIR/C96_atm3DVar_2a89ef85

emcbot commented Nov 6, 2024

Experiment C48mx500_3DVarAOWCDA FAILED on Hera in Build# 3 in
/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/EXPDIR/C48mx500_3DVarAOWCDA_2a89ef85

emcbot commented Nov 6, 2024

Experiment C96C48_hybatmDA FAILED on Hera in Build# 3 with error logs:

/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C96C48_hybatmDA_2a89ef85/logs/2021122100/gdas_arch.log

Follow link here to view the contents of the above file(s): (link)

emcbot commented Nov 6, 2024

Experiment C96C48_hybatmDA FAILED on Hera in Build# 3 in
/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/EXPDIR/C96C48_hybatmDA_2a89ef85

emcbot commented Nov 6, 2024

Experiment C96C48_ufs_hybatmDA FAILED on Hera in Build# 3 with error logs:

/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C96C48_ufs_hybatmDA_2a89ef85/logs/2024022400/gdas_arch.log

Follow link here to view the contents of the above file(s): (link)

emcbot commented Nov 6, 2024

Experiment C96C48_ufs_hybatmDA FAILED on Hera in Build# 3 in
/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/EXPDIR/C96C48_ufs_hybatmDA_2a89ef85

@aerorahul
Contributor

> Experiment C96_S2SWA_gefs_replay_ics FAILED on Hera in Build# 3 with error logs:
>
> /scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C96_S2SWA_gefs_replay_ics_2a89ef85/logs/2020110100/arch.log
>
> Follow link here to view the contents of the above file(s): (link)

HPSS is down for maintenance today (11/6) and rest of the week.

@WalterKolczynski-NOAA
Contributor

> > Experiment C96_S2SWA_gefs_replay_ics FAILED on Hera in Build# 3 with error logs:
> >
> > /scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C96_S2SWA_gefs_replay_ics_2a89ef85/logs/2020110100/arch.log
> >
> > Follow link here to view the contents of the above file(s): (link)
>
> HPSS is down for maintenance today (11/6) and rest of the week.

Next two days are only partial read outages, which I think means the tests should work after today.

emcbot commented Nov 6, 2024

Experiment C48_S2SW FAILED on Hera in Build# 3 with error logs:

/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C48_S2SW_2a89ef85/logs/2021032312/gfs_arch.log

Follow link here to view the contents of the above file(s): (link)

emcbot commented Nov 6, 2024

Experiment C48_S2SW FAILED on Hera in Build# 3 in
/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/EXPDIR/C48_S2SW_2a89ef85

@emcbot added the CI-Hera-Failed label and removed the CI-Hera-Failed label on Nov 6, 2024

emcbot commented Nov 6, 2024

CI Failed on Hera in Build# 3
Built and ran in directory /scratch1/NCEPDEV/global/CI/3053


Experiment C96C48_hybatmaerosnowDA_2a89ef85 Terminated with 0
FAIL
FAIL tasks failed and 1 dead at Wed Nov  6 17:11:03 UTC 2024
Experiment C96C48_hybatmaerosnowDA_2a89ef85 Terminated: *FAIL*
Error logs:
/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C96C48_hybatmaerosnowDA_2a89ef85/logs/2021122012/gdas_fcst_seg0.log
Experiment C48_ATM_2a89ef85 Terminated with 0
FAIL
FAIL tasks failed and 1 dead at Wed Nov  6 18:17:45 UTC 2024
Experiment C48_ATM_2a89ef85 Terminated: *FAIL*
Error logs:
/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C48_ATM_2a89ef85/logs/2021032312/gfs_arch.log
Experiment C96_S2SWA_gefs_replay_ics_2a89ef85 Terminated with 0
FAIL
FAIL tasks failed and 1 dead at Wed Nov  6 18:17:47 UTC 2024
Experiment C96_S2SWA_gefs_replay_ics_2a89ef85 Terminated: *FAIL*
Error logs:
/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C96_S2SWA_gefs_replay_ics_2a89ef85/logs/2020110100/arch.log
Experiment C48mx500_3DVarAOWCDA_2a89ef85 Terminated with 0
FAIL
FAIL tasks failed and 1 dead at Wed Nov  6 18:30:02 UTC 2024
Experiment C48mx500_3DVarAOWCDA_2a89ef85 Terminated: *FAIL*
Error logs:
/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C48mx500_3DVarAOWCDA_2a89ef85/logs/2021032418/gdas_arch.log
Experiment C96_atm3DVar_2a89ef85 Terminated with 0
FAIL
FAIL tasks failed and 1 dead at Wed Nov  6 18:30:04 UTC 2024
Experiment C96_atm3DVar_2a89ef85 Terminated: *FAIL*
Error logs:
/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C96_atm3DVar_2a89ef85/logs/2021122100/gdas_arch.log
Experiment C96C48_hybatmDA_2a89ef85 Terminated with 0
FAIL
FAIL tasks failed and 1 dead at Wed Nov  6 18:30:07 UTC 2024
Experiment C96C48_hybatmDA_2a89ef85 Terminated: *FAIL*
Error logs:
/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C96C48_hybatmDA_2a89ef85/logs/2021122100/gdas_arch.log
Experiment C96C48_ufs_hybatmDA_2a89ef85 Terminated with 0
FAIL
FAIL tasks failed and 1 dead at Wed Nov  6 18:48:24 UTC 2024
Experiment C96C48_ufs_hybatmDA_2a89ef85 Terminated: *FAIL*
Error logs:
/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C96C48_ufs_hybatmDA_2a89ef85/logs/2024022400/gdas_arch.log
Experiment C48_S2SWA_gefs_2a89ef85 Completed 1 Cycles: *SUCCESS* at Wed Nov  6 19:31:31 UTC 2024
Experiment C48_S2SW_2a89ef85 Terminated with 0
FAIL
FAIL tasks failed and 1 dead at Wed Nov  6 20:01:33 UTC 2024
Experiment C48_S2SW_2a89ef85 Terminated: *FAIL*
Error logs:
/scratch1/NCEPDEV/global/CI/3053/RUNTESTS/COMROOT/C48_S2SW_2a89ef85/logs/2021032312/gfs_arch.log

@emcbot added the CI-Hercules-Passed label and removed the CI-Hercules-Running label on Nov 6, 2024

emcbot commented Nov 6, 2024

CI Passed on Hercules in Build# 2
Built and ran in directory /work2/noaa/global/CI/HERCULES/3053


Experiment C48_ATM_2a89ef85 Completed 2 Cycles: *SUCCESS* at Wed Nov  6 12:10:53 CST 2024
Experiment C96_S2SWA_gefs_replay_ics_2a89ef85 Completed 1 Cycles: *SUCCESS* at Wed Nov  6 12:29:06 CST 2024
Experiment C96_atm3DVar_2a89ef85 Completed 3 Cycles: *SUCCESS* at Wed Nov  6 13:23:49 CST 2024
Experiment C96C48_hybatmDA_2a89ef85 Completed 3 Cycles: *SUCCESS* at Wed Nov  6 13:24:00 CST 2024
Experiment C48_S2SW_2a89ef85 Completed 2 Cycles: *SUCCESS* at Wed Nov  6 14:00:13 CST 2024
Experiment C48_S2SWA_gefs_2a89ef85 Completed 1 Cycles: *SUCCESS* at Wed Nov  6 14:07:19 CST 2024

@DavidHuber-NOAA
Contributor Author

Hera failed in part due to HPSS being down, but also because the machine-specific AERO_INPUTS_DIR was not being populated, which caused the C96C48_hybatmaerosnowDA forecast to fail. I will relaunch tests tomorrow after verifying data can be pushed to HPSS.
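
A check along the following lines could surface an unpopulated host value at setup time instead of in a forecast job. It is only a sketch: find_missing_keys, the required-key list, and the sample host dictionary are hypothetical, and the PR may address the problem differently.

```python
from typing import Any, Iterable, List, Mapping


def find_missing_keys(host: Mapping[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent or empty in the host mapping."""
    missing = []
    for key in required:
        value = host.get(key)
        if value is None or value == "":
            missing.append(key)
    return missing


# Hypothetical host settings in which AERO_INPUTS_DIR was never filled in.
host = {"HPSSARCH": "NO", "AERO_INPUTS_DIR": ""}
missing = find_missing_keys(host, ["HPSSARCH", "AERO_INPUTS_DIR"])
if missing:
    print("Host file is missing values for: " + ", ".join(missing))
```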

Labels: CI-Hera-Failed, CI-Hercules-Passed
Development

Successfully merging this pull request may close these issues.

Move all machine-specific options to workflow/hosts files
5 participants